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Volcano eruption forecast remains a challenging and con- 
troversial problem despite the fact that data from volcano 
monitoring significantly increased in quantity and quality 
during the last decades. This study uses pattern recognition 
techniques to quantify the predictability of the 15 Piton de 
la Fournaise (PdlF) eruptions in the 1988-2001 period using 
increase of the daily seismicity rate as a precursor. Lead 
time of this prediction is a few days to weeks. Using the 
daily seismicity rate, we formulate a simple prediction rule, 
use it for retrospective prediction of the 15 eruptions, and 
test the prediction quality with error diagrams. The best 
prediction performance corresponds to averaging the daily 
seismicity rate over 5 days and issuing a prediction alarm 
for 5 days. 65% of the eruptions are predicted for an alarm 
duration less than 20% of the time considered. Even though 
this result is concomitant of a large number of false alarms, 
it is obtained with a crude counting of daily events that are 
available from most volcano observatories. 



1. Introduction 

The effective prediction success of volcanic eruptions is 
rare if one defines "prediction" as a precise statement of 
time, place, and ideally the nature and size of an impend- 
ing activity [Minakami, 1960; Swanson et al., 1985; Voight, 
1988; Tilling and Lipman, 1993; Chouet, 1996; Mcnutt, 
1996]. A noteworthy obstacle is that most studies do not 
quantify the effectiveness and reliability of proposed pre- 
dictions, and often do not surpass the analysis of a unique 
success on a single case history with the lack of systematic 
description of forecasting results. In this study we focus 
on rigorous quantification of the predictive power of the in- 
crease in the daily seismicity rate — a well-known and prob- 
ably the simplest volcano premonitory pattern. 

Following Mtnakami [1960], Kagan and Knopoff [1987], 
Keilis-Borok [2002], we do not consider here deterministic 
predictions, and define a prediction to be "a formal rule 
whereby the available observable manifold of eruption oc- 
currence is significantly contracted and for this contracted 
manifold a probability of occurrence of an eruption is signif- 
icantly increased" [Kagan and Knopoff, 1987]. To quantify 
the effectiveness and reliability of such predictions we use 
error diagrams [Kagan and Knopoff, 1987; Molchan, 1997]. 

Previous attempts in probalistic forecast of volcanic erup- 
tions used seismicity data in combination with other obser- 
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vations or alone [Minakami, 1960; Klein, 1984; Mulargia et 
al., 1991, 1992]. These studies did not quantify the predic- 
tion schemes in the error diagram framework. Mtnakami 
[1960] was a pioneer in the development of seismic statistics 
method for volcano monitoring. Based on the data from the 
andesitic Asama volcano, Honshu, he uses the increase in 
five-day frequencies of earthquakes to derive an increase in 
the probability for an eruption in the next 5 days. Klein 
[1984] tests the precursory significance of geodetic data, 
daily seismicity rate, and tides before the 29 eruptions dur- 
ing 1959-1979 at the Kilauea volcano, Hawaii. He derives 
a probabilistic prediction scheme that applies for eruptions 
anywhere on the volcano and can give 1- or 30-days forecast. 
The forecasting ability of daily seismicity rate is shown to 
be better than random at 90% confidence in forecasts on the 
time scale of 1 or 30 days using small earthquakes that occur 
in the caldera. A better performance is achieved with a 99% 
confidence when using located earthquakes only, in forecasts 
on the time scale of 1 day. Mulargia et al. [1991, 1992] use 
regional seimicity to define clusters of seismic events within 
120 km distance of Etna volcano. Clusters within this re- 
gional seismicity are found within 40 days before 9 out of 
11 flank eruptions in the 1974-1989 period. On the same 
period no statistically significant patterns are identified 40 
days before and after the 10 summit eruptions. 

As a test site we choose the PdlF volcano, the most active 
volcano worldwide for the last decades with 15 eruptions in 
the 1988-2001 period. On this site the volcanic risk remains 
low because most of the eruptions are effusive and occurred 
in an area that is not inhabited. For the PdlF site, the in- 
crease in seismicity rate and an increase in deformation rate 
have been reported within a few hours prior to an eruption 
(e.g. [Lenat et al, 1989; Grasso and Bachelery, 1995; Sapin 
et al., 1996; Aki and Ferrazzini, 2000; Collombet et al., 2003; 
Lenat et al., 1989; Cayol and Cornet, 1998]. Although the 
deformation data are very efficient to locate the lava outflow 
vents from a few hours to minutes before the surface lava 
flow, there is not yet a long term catalog available to test 
how they can be used to forecast an eruption days to weeks 
in advance. 

In this study we quantify the predictability of the PdlF 
eruptions on the longer time scale of a few days to weeks prior 
to an eruption. Collombet et al. [2003] show that accelerat- 
ing seismicity rate weeks prior to the PdlF eruptions can be 
recovered on average using the superposed epoch analysis 
before numerous eruptions. Here we show that the increase 
of the daily seismicity rate is useful as well to forecast indi- 
vidual eruptions. This is achieved by rigorous quantification 
of the prediction performance by introducing error diagrams 
[Kagan and Knopoff, 1987; Molchan, 1997] to choose among 
competitive prediction strategies. 

2. Data 

The PdlF hot spot volcano is a shield volcano with an 
effusive erupting style due to low viscosity basaltic magma. 
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During 1988-2001 period the seismicity at the PdlF site re- 
mained low, with Mmax = 3.5, and was localized within a 
radius of a few km beneath the central caldera. Less than 
10% of these small events are located, most of them being 
only recorded by the three summit stations located 3 km 
apart from each other. Contrary to the Mauna Loa - Ki- 
lauea volcanic system, there is no seismically active flank 
sliding or basal faulting on the PdlF. Contrary to the Etna 
volcano, there is no tectonic interaction with neighboring 
active structures. Accordingly, the PdlF seismicity is one of 
the best candidates to be purely driven by volcano dynam- 
ics. This seismogenic volume is also thought to be the main 
path for the magma to flow from a shallow storage system 
toward the surface [Lenat and Bachelery, 1990; Sapin et al, 
1996; Bachelery, 1999; Aki and Ferrazim, 2000]. 

The PdlF seismicity catalog consists of data from the 
16 seismic stations [Sapin et al, 1996; Akt and Ferrazzini, 
2000]. During the May 1988- June 2001 period the geome- 
try and instrumental characteristics of the seismic network 
remained stable, with a magnitude detection threshold of 
0.5 [Collombet et al, 2003]. In this period 15 eruptions 
were seismically monitored. We use here the seismicity rate 
of the volcano tectonic (VT) events, excluding long period 
(LP) events or rockfall signals. The number of LP events at 
the PdlF site is insignificant compared to the number of VT 
events. For example, the eruption of 1998 was acconpanied 
by a single LP event 4 hours before the surface lava flow 
[Aki and Ferrazzini, 2000], and 2500 VT events had been 
recorded at that time. 



3. Synthesis of seismicity pattern before 
eruptions 



is no recurrent migration of seismicity during these crises 
[e.g. Sapin et al, 1996] we suggested, as proposed by Rubin 
et al. [1998], that damage is neither directly related to the 
dyke tip, nor does it always map the dyke propagation. It is 
the response to dike intrusion of parts of the volcano edifice 
that are close to failure [e.g. Grasso and Bachelery, 1995]. 

We synthesize the pre-emption seismicity rate on the 
PdlF volcano as a 3 step process (Figure 1). First, the seis- 
micity rate increases in average and it follows a power law 
10-15 days prior the eruption [Collombet et al, 2003]. This 
is reminiscent of the average foreshock patterns observed 
for earthquakes [Jones and Molnar, 1979; Helmstetter and 
Sornette, 2003]. As for earthquakes, we suggest that this 
pattern illuminates a local damage process rather than a 
macroscopic failure, the damage being localized within the 
magma storage system a few km below the volcano [e.g. 
Sapin et al, 1996]. This average acceleration is different 
from the acceleration proposed prior to each single eruption 
by Voight [1988], or individual large earthquakes [e.g. Bufe 
and Varnes, 1993]. The second phase is seismically mapped 
by a discontinuity in seismicity rate from a peak value < 20 
events/day to a > 2000 events/day constant rate (Figure 1). 
We suggest it corresponds to the onset of the magma flow 
outward of the storage system. The third phase is char- 
acterized by a constant strong seismicity rate during each 
crisis. We suggest it corresponds to the damage induced by 
fluid flow, either as a diffuse response to dyke propagation 
in an heterogeneous rock matrix or as damage in the open 
reservoir walls during fluid flow. 

This pre-emption scheme helps both to clarify the erup- 
tion phases on the PdlF and to define our prediction targets. 
If one uses a conventional definition of the target as the onset 



Although the peaks of seismicity rate clearly correlate 
with eruption days (see Figure 1 in [Collombet et al, 2003]), 
it is difficult to identify a long-term seismicity pattern before 
each eruption, except possibly during the last few hours be- 
fore surface lava flow [Lenat et al., 1989; Sapin et al., 1996; 
Aki and Ferrazzini, 2000; Collombet et al, 2003]. For all the 
15 PdlF eruptions the hourly seismicity rate during the seis- 
micity crisis that precedes each surface lava flow is roughly 
constant with values ranging from 60 to 300 events/hr, with 
an average value of 120 events/hr. The average crisis dura- 
tion is 4 hrs, the extreme values ranging from 0.5 hours for 
the may 1988 eruption to 36 hours for the 1998 eruption. 
No correlation is found between the seismic rates or the du- 
rations of the crisis and the erupted volumes. Because there 
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Figure 2. Prediction sclieme and prediction outcomes. 
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Figure 1. Average pre-eruptive pattern before a PdlF 
eruption. This behavior is obtained by averaging the seis- 
micity rate over the 15 eruptions during 1988-2001. 
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Figure 3. Error diagram: fraction of failures to predict 
as a function of alarm duration. The diagonal line corre- 
spond to a random prediction. Deviations from this line 
depict predictive power of the precursor. 
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time of surface lava flow, then all the eruptions can be pre- 
dicted a few hours in advance by choosing a daily seismicity 
rate larger than 60 events/day as an alarm threshold. For 
this threshold value the seismic crisis that did not end up in 
an eruption are false alarms. They are post-labelled at the 
observatory as intrusion, and are part of the endogeneous 
growth of any volcano. We aim to find precursory patterns 
before the outward magma flow from the reservoir system. 
Accordingly, we define our target as the onset of a reser- 
voir leak as mapped by the end of the average acceleration 
process and before the onset of the eruption crisis (Figure 
1). This target possibly maps a local failure in the reser- 
voir walls, contemporary to the onset of outward magma 
flow from the reservoir, and corresponds to predicting erup- 
tions more than one day in advance. Thus, our problem is 
difi'erent from that posed by Klein [1984]. 

4. Prediction scheme and error diagram 

Here we follow a pattern recognition approach [e.g. 
Gelfand et al, 1976] to predict rare extreme events in com- 
plex systems; this approach is reviewed by Keilis-Borok 
[2002]. To use pattern recognition techniques as a fore- 
casting tool we deflne 3 steps in the data analysis. First 
we consider a sequence of VT earthquake occurrence times 
C = te : e = 1 , 2 , . . . E; te < te+i ■ Note that we use neither 
magnitude nor location of events. Second, on the sequence 
C we define a function N(t, s) as the number of earthquakes 
within the time window [t — s,t\, s being a numerical pa- 
rameter. This functional is calculated for the time interval 
considered with different values of numerical parameter s. 
Third, an alarm is triggered when the functional N{t, s) ex- 
ceeds a predefined threshold A'o . The threshold A'^o is usually 
chosen as a certain percentile of the distribution function for 
the functional N{t,s). The alarm is declared for a time in- 
terval A. The alarm is terminated after an eruption occurs 
or the time A expires, whichever comes first. Our predic- 
tion scheme depends on three parameters: time window s, 
threshold A*'o, and duration A of alarms. The quality of this 
kind of prediction is evaluated with help of "error diagrams" 
which are a key element in evaluating a prediction algorithm 
[Kagan and Knopoff, 1987; Molchan, 1997]. 

The definition of an error diagram is the following. Con- 
sider prediction by the scheme described above. We con- 
tinously monitor seismicity, declare alarms when the func- 
tional N{t, s) exceeds the threshold, and count the predic- 
tion outcomes (Figure 2). During a given time interval T, N 
targets occurred and Np of them were not predicted. The 
number of declared alarms was A, with Ap oi them being 
false alarms. The total duration of alarms was D. The 
error diagram shows the trade-off between the relative du- 
ration of alarms r = D/T, the fraction of failures to predict 
n = Nf/N, and the fraction of false alarms / — Af/A. In 
the (n, r)-plane the straight line n -\- t = 1 corresponds to 
a random binomial prediction — at each step in time the 
alarm is declared with some probability r and not declared 
with probability 1 — r. Given a particular prediction that 
depends on our three parameters (s, A'^o, A), different points 
in the error diagram correspond to different values of these 
parameters. Error diagrams thus tally the score of a predic- 
tion algorithm's successes and errors. This score depends on 
the algorithm's adjustable parameters. For example, raising 
the threshold A'o will reduce the number of alarms A but 
may increase the number Nf of failures to predict. Raising 
A, on the other hand, will increase the duration alarms D 
but may reduce the number of failures to predict Nf, etc. 
A prediction algorithm is useful if: (i) the prediction qual- 
ity is better than that of a random one, i.e. the points on 
error diagram are close to the origin and distant from the 
diagonal n + t = 1; and (ii) this quality is fairly insensitive 
to changes in the parameters. 



5. Results and Discussion 

We estimate the time predictability of volcanic eruptions 
based on the increase of the daily seismicity rate. The pa- 
rameters of the algorithm are varied as follows: 1 < s < 30 
days, 1 < A'o < 100 events per s days, 1 < A < 30 days. 
The 30 day limit is the minimum time between two eruptions 
during 1988-2001. The best predictions are obtained when 
averaging seismicity rate over a 5 day window and declaring 
an alarm for 5 days. The predictive skills of our prediction 
scheme are illustrated by the error diagrams of Figures (3, 
4). Each point in the error diagram corresponds to different 
values of the threshold A'o ranging from 1 to 100 events per 5 
days, other parameters are flxed as s = 5 days, A = 5 days. 
Error diagrams outline the whole range of possible predic- 
tion outcomes; thus they are more convenient for decision 
making than performance of "the best" single version of pre- 
diction. We observe for instance (Point A) that 65% of the 
PdlF eruptions can be predicted with 20% of the time cov- 
ered by alarms. These results are of the same quality as that 
obtained on the Etna or the Hawaii volcanoes. For instance, 
using regional seismicity in a 120 km radius from the Etna 
volcano, 50% of the eruptions could have been predicted 
within 40 days in the 1974-1990 period, which can be sorted 
as 80 % of the 11 flank eruptions, and no summit eruptions 
[Mulargia et al., 1991, 1992]. Decreasing the threshold A'o 
yields an alternative prediction strategy that favors a lower 
failure to predict rate and accepts a higher alarm duration 
rate; it is shown as point B on Figure (3). The choice of 
a particular prediction strategy must be always based on 
the analysis of the entire error diagram; different prediction 
strategies may be used in parallel to complement each other 
(see more in [Molchan, 1997; Zaliapin et al., 2003]). 

It is worth noticing that the performance of our simple 
prediction algorithm, which is based on mere averaging of 
the seismicity rate, is close to the performance of much more 
sophisticated algorithms that use numerous seismic param- 
eters to predict large observed earthquakes [e.g. Kossobokov 
et al., 1999]. The significant predictability we obtain here is 
still concomitant of a fraction of false alarm larger than 90% 
(Figure 4). Because this predictability emerges from the use 
of a daily seismicity rate only, we expect that a modifica- 
tion of the above prediction strategy to include earthquake 
location and magnitudes with deformation and geochemistry 
data will improve this first quantitative analysis of eruption 
prediction on PdlF. 
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Figure 4. Error diagram: fraction of false alarm as a 
function of alarm duration. The point at 20% of alarm 
rate as deduced from Figure (3) correspond to a 90% 
false alarm rate. 
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